Efficient implementation of a multiplier/accumulator with load

ABSTRACT

This invention is a multiply-accumulate circuit supporting a load of the accumulator. During multiply-accumulate operation a partial product generator forms partial products from the product inputs. An adder tree sums the partial products and the accumulator value. The sum is stored back in the accumulator, overwriting the prior value. During load operation an input gate forces one of the product inputs to all 0's. Thus the partial product generator generates partial products corresponding to a zero product. The adder tree adds this zero product to the external load value. The sum, which corresponds to the external load value, is stored back in the accumulator, overwriting the prior value. A multiplexer at the side input of the adder tree selects the accumulator value for normal operation or the external load value for load operation.

CLAIM OF PRIORITY

This application claims priority under 35 U.S.C. 119(e)(1) to U.S. Provisional Application No. 62/078,166 filed Nov. 11, 2014.

TECHNICAL FIELD OF THE INVENTION

The technical field of this invention is multiply-accumulators.

BACKGROUND OF THE INVENTION

A digital signal processor (DSP) instruction set typically includes multiply-accumulate instructions which are executed in dedicated hardware. This multiply-accumulate hardware implements the function:

Acc<=Acc+(X*Y).

As defined above the multiply-accumulate instruction forms the product of two operands X and Y and adds their product to the value stored in an accumulator. The sum is stored in the accumulator overwriting the previous value.
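The function above can be sketched in software as follows. This is only an illustrative model: the accumulator width `ACC_BITS`, the wrap-around behavior on overflow, and the helper names `mac` and `dot_product` are assumptions made for the example and are not taken from this description.

```python
ACC_BITS = 40  # hypothetical accumulator width; the source does not give one


def mac(acc, x, y, acc_bits=ACC_BITS):
    """Return the next accumulator value for Acc <= Acc + (X * Y)."""
    mask = (1 << acc_bits) - 1
    # Assumed wrap-around on overflow; real hardware may saturate instead.
    return (acc + x * y) & mask


def dot_product(xs, ys):
    """Accumulate a sequence of products, one multiply-accumulate per pair."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)  # the sum overwrites the previous value
    return acc
```

A dot product, as used in filters and transforms, is the typical use: each element pair costs one multiply-accumulate, and the accumulator holds the running sum.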

It is desirable to have load or move instructions having this accumulation register (Acc) as a destination. Implementing this function typically employs a multiplexer between data from the multiply-accumulate result and the load/move data. The prior art places this multiplexer just before the Acc register. This places the multiplexer on the most critical path. This critical path is the data flow from the X/Y source registers, through the multiply and add operations to the Acc register.

The multiplexer placement creates a problem. Because it lies in the critical path, the multiplexer slows operation of the multiply-accumulate function. The prior art placement of the multiplexer causes all operations to slow, even ordinary multiply-accumulation. This limits the clock rate that can be employed, potentially slowing all data processor operation.

SUMMARY OF THE INVENTION

This invention is a multiply-accumulate circuit supporting a load of the accumulator. During multiply-accumulate operation a partial product generator forms partial products from the product inputs. An adder tree sums the partial products and the accumulator value. The sum is stored back in the accumulator, overwriting the prior value.

During load operation an input gate forces one of the product inputs to all 0's. Thus the partial product generator generates partial products corresponding to a zero product. The adder tree adds this zero product to the external load value. The sum, which corresponds to the external load value, is stored back in the accumulator, overwriting the prior value.

A multiplexer at the side input of the adder tree selects the accumulator value for normal operation or the external load value for load operation. This placement of the multiplexer is out of the critical path during normal operation. Thus this circuit may operate faster than the prior art that includes the multiplexer between the adder tree and the accumulator.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other aspects of this invention are illustrated in the drawings, in which:

FIG. 1 illustrates a multiply-accumulator having a load accumulator operation with the prior art multiplexer placement;

FIG. 2 illustrates the multiply-accumulator according to this invention; and

FIG. 3 illustrates an alternate input according to this invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

FIG. 1 illustrates a multiply-accumulator 100 having a load accumulator operation according to the prior art. Multiply-accumulator 100 includes input register 101 which receives and stores a first input operand X. Multiply-accumulator 100 includes input register 102 which receives and stores a second input operand Y. The sources of X and Y could be data from registers in a register file, data recalled from memory or external inputs. The sources of X and Y are not relevant to this invention.

In accordance with the prior art the multiply-accumulate operation is performed in hardware by the combination of partial product generator 103 and adder tree 104. The data stored in input register 101 and input register 102 are supplied to respective inputs of partial product generator 103. Partial product generator 103 generates a set of partial products from the inputs X and Y. Adder tree 104 sums these partial products with the proper shifts to form the product. Adder tree 104 also receives the current data stored in accumulator 106. Adder tree 104 adds the current data from accumulator 106 to the product data by summing the accumulator data with the partial products.
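The split between partial product generation and the adder tree can be illustrated with a small behavioral sketch. It assumes unsigned operands, a simple one-partial-product-per-bit scheme, and a hypothetical operand width of 8 bits; the description does not state which partial product scheme or widths the hardware uses, and the function names are illustrative only.

```python
def partial_products(x, y, width=8):
    """One partial product per bit of y: x where the bit is 1, else 0."""
    return [x if (y >> i) & 1 else 0 for i in range(width)]


def adder_tree(pps, acc):
    """Sum the partial products with their shifts, plus the accumulator data."""
    total = acc
    for i, pp in enumerate(pps):
        total += pp << i  # partial product i carries a weight of 2**i
    return total


# 13 * 11 = 143, accumulated onto a current accumulator value of 100.
result = adder_tree(partial_products(13, 11), 100)
```

In this model the accumulator value is just one more addend to the tree, which is why adding it costs little beyond the multiplication itself.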

The accumulated product output of adder tree 104 supplies one input of multiplexer 105. A second input of multiplexer 105 receives external source data. This external source data is used to initialize or load accumulator 106. The control input to multiplexer 105 determines the data selected. During ordinary multiply-accumulate operations, multiplexer 105 selects data from adder tree 104 for storage in accumulator 106. During load operations, multiplexer 105 selects data from the external source for storage in accumulator 106. The selected output of multiplexer 105 is stored in accumulator 106 overwriting any prior data.
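The prior art data flow of FIG. 1 can be summarized by the following behavioral sketch. The function name and the use of a Boolean `load` flag for the multiplexer control input are assumptions made for illustration; overflow handling is omitted.

```python
def prior_art_next_acc(acc, x, y, external, load):
    """Next accumulator value for the FIG. 1 arrangement (behavioral model)."""
    # Partial product generator 103 and adder tree 104: Acc + (X * Y).
    adder_tree_out = acc + x * y
    # Multiplexer 105 sits after the adder tree, so its selection is made
    # on every cycle, including ordinary multiply-accumulate cycles.
    return external if load else adder_tree_out
```

The selection is the last step before the accumulator, so in hardware its gate delay is added to the multiply and add delay.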

The placement of multiplexer 105 illustrated in FIG. 1 creates a problem. Multiplexer 105 introduces additional gate delay in the critical path. Note that every operation of multiply-accumulator 100 involves the gate delay of multiplexer 105. Multiply-accumulator 100 must be clocked at a lower frequency to accommodate this additional gate delay relative to a non-load multiply-accumulator. This limits the clock rate that can be employed, potentially slowing all data processor operation.

FIG. 2 illustrates multiply-accumulator 200 according to this invention. During multiply-accumulate operation multiply-accumulator 200 operates in the same manner as multiply-accumulator 100. Multiply-accumulator 200 includes input register 201 which receives and stores a first input operand X and input register 203 which receives and stores a second input operand Y. Note that during normal operations AND gate 202 passes operand Y unchanged to input register 203. The data stored in input register 201 and input register 203 are supplied to respective inputs of partial product generator 204. Adder tree 205 sums the partial products with the proper shifts to form the product. Adder tree 205 also receives the current data stored in accumulator 206 (via multiplexer 207 further described below). Adder tree 205 adds the current data from accumulator 206 to the product data by summing the accumulator data with the partial products. The output of adder tree 205 is stored in accumulator 206 overwriting any prior data.

A load operation employs AND gate 202 and multiplexer 207. AND gate 202 includes the same structure for each bit of the second operand Y. A first input of AND gate 202 receives the operand bit Y_(i). A second inverting input of AND gate 202 receives a control signal ZERO. During normal (multiply-accumulate) operation control signal ZERO is all 0's. Due to the inverting action of the inverting input, all bits Y_(i) are passed unchanged to be stored in input register 203. During load operation control signal ZERO is all 1's. Thus all bits of operand Y are blocked and the output of AND gate 202 is all 0's. This zero input is stored in input register 203 and then passed to partial product generator 204. The resulting partial products resolve to all 0's. This all zero set of partial products is supplied to adder tree 205.
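The gating performed by AND gate 202 can be modeled bit-wise as below. The 16-bit operand width and the function name `zero_gate` are assumptions for the sketch; the description does not state an operand width.

```python
def zero_gate(y, zero, width=16):
    """Model of AND gate 202: each bit computes Y_i AND (NOT ZERO)."""
    mask = (1 << width) - 1
    zero_bus = mask if zero else 0   # ZERO driven to every bit position
    inverted = ~zero_bus & mask      # effect of the inverting input
    return y & inverted


normal = zero_gate(0xABCD, zero=False)  # Y passes unchanged
loading = zero_gate(0xABCD, zero=True)  # Y is blocked; output is all 0's
```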

Multiplexer 207 selects the second input to adder tree 205 as specified by the control input. During multiply-accumulate operation multiplexer 207 selects the current contents of accumulator 206. During load operation multiplexer 207 selects the external data source. Because the output of partial product generator 204 is always zero during load operations, due to AND gate 202 zeroing one of its inputs, the sum of adder tree 205 is the external data. Adder tree 205 supplies this external data to accumulator 206 for storage.
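A behavioral sketch of the FIG. 2 data flow shows why the load works without a multiplexer after the adder tree. It assumes unsigned operands, a hypothetical 16-bit operand width, one partial product per operand bit, and a single Boolean `load` flag driving both ZERO and the multiplexer control; these details and the function name are illustrative and not taken from the description.

```python
def mac200_next_acc(acc, x, y, external, load, width=16):
    """Next accumulator value for the FIG. 2 arrangement (behavioral model)."""
    y_gated = 0 if load else y        # AND gate 202: force Y to all 0's on load
    side = external if load else acc  # multiplexer 207 at the adder tree side input
    # Partial product generator 204: all partial products are 0 when y_gated is 0.
    pps = [(x << i) if (y_gated >> i) & 1 else 0 for i in range(width)]
    # Adder tree 205: its output goes straight to accumulator 206.
    return sum(pps) + side


after_mac = mac200_next_acc(5, 2, 3, 99, load=False)  # 5 + 2*3
after_load = mac200_next_acc(5, 2, 3, 99, load=True)  # 0 + 99
```

Whatever value X holds during the load, the zeroed Y operand makes the product zero, so the external data reaches the accumulator unchanged.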

This invention puts the multiplexer in the path from the accumulator register to the adder. This path is less critical than the path from the X/Y registers. The data from accumulator 206 can be fed into adder tree 205 at a later point than the partial products from partial product generator 204. AND gate 202 forces the partial products to zero, so that adder tree 205 passes the external data to accumulator register 206. This removes the multiplexer gate delay from the critical path in the multiply-accumulator. Accordingly, the DSP is permitted to operate at a higher clock frequency.

FIG. 3 illustrates an alternative embodiment of this invention. In FIG. 3 AND gate 303 is disposed between input register 302 and partial product generator 204 rather than between the source of Y and input register 203 as illustrated in FIG. 2. The circuit of FIG. 3 operates as previously described. This circuit passes either operand Y or all 0's to the corresponding input of partial product generator 204.

Those skilled in the art would realize that the AND gate need not be in the second operand path. This circuit would operate equally well with the AND gate in either operand path. Those skilled in the art would realize the recited inverting input to the AND gate depends upon the sense of the control signal ZERO. It is feasible to employ a non-inverting input to the AND gate if control signal ZERO were all 1's for multiply-accumulate operation and all 0's for load operation.
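Both variations can be checked with a short single-bit sketch. The function names and the `pass_enable` signal (the control signal with the opposite sense) are hypothetical labels introduced only for this illustration.

```python
def and_inverting(bit, zero):
    """Gate with inverting control input: ZERO = 1 blocks the bit."""
    return bit & (zero ^ 1)


def and_noninverting(bit, pass_enable):
    """Gate with non-inverting control input: control = 0 blocks the bit."""
    return bit & pass_enable


def polarities_agree():
    """Compare both gate forms over every bit and control combination."""
    return all(
        and_inverting(b, z) == and_noninverting(b, z ^ 1)
        for b in (0, 1)
        for z in (0, 1)
    )


def gated_product(x, y, load, gate_x):
    """Zero either operand on load; the product is zero in both cases."""
    if load:
        x, y = (0, y) if gate_x else (x, 0)
    return x * y
```

Since a product is zero when either factor is zero, gating X instead of Y yields the same zero set of partial products during a load.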

What is claimed is:
 1. A multiply-accumulate circuit comprising: a bit-wise AND gate having a plurality of bits, each bit having a first input receiving a corresponding input operand bit, a second input receiving a control signal ZERO and an output; a partial product generator having a first input receiving a first operand and a second input connected to said output of said bit-wise AND gate, said partial product generator forming partial products of data supplied to said first and second inputs at an output; an adder tree having a first input connected to said partial product generator output and a second input, said adder tree forming a sum of said partial products and data supplied to said second input at an output; an accumulator having an input connected to said output of said adder tree for receiving and storing said sum and an output for supplying data stored therein; and a multiplexer having a first input connected to said output of said accumulator and a second input receiving external data, said multiplexer supplying a selected one of data on said first input or on said second input to an output connected to said second input of said adder tree.
 2. The multiply-accumulate circuit of claim 1, wherein: upon a multiply-accumulate operation said ZERO signal causes said bit-wise AND gate to pass each corresponding input operand bit unchanged, and said multiplexer supplying said first input to said adder tree; and upon a load operation said ZERO signal causes said bit-wise AND gate to pass 0 for each corresponding input operand bit, and said multiplexer supplying said second input to said adder tree.
 3. The multiply-accumulate circuit of claim 1, further comprising: a first input register having an input receiving and storing a first input operand and an output supplying data stored therein to said first input of said partial product generator; and a second input register having an input receiving and storing said output of said bit-wise AND gate and an output supplying data stored therein to said second input of said partial product generator.
 4. The multiply-accumulate circuit of claim 1, further comprising: a first input register having an input receiving and storing a first input operand and an output supplying data stored therein to said first input of said partial product generator; and a second input register having an input receiving and storing a second input operand and an output supplying each bit of data stored therein to said first input of corresponding bit of said bit-wise AND gate.
 5. A method of multiplication-accumulation comprising the steps of: supplying a first operand to a partial product generator; supplying a selected one of a second operand or all 0's to the partial product generator; employing the partial product generator to form partial products of supplied data; supplying the partial products of the partial product generator to an adder tree; supplying a selected one of data stored in an accumulator register or external sourced data to the adder tree; employing the adder tree to form a sum of said partial products and data supplied; storing the sum of the adder tree in the accumulator register.
 6. The method of multiplication-accumulation of claim 5, wherein: upon a multiply-accumulate operation supplying said second operand to the partial product generator, and supplying said data stored in the accumulator register to the adder tree; and upon a load operation supplying all 0's to the partial product generator, and supplying said external sourced data to the adder tree.
 7. The method of multiplication-accumulation of claim 5, further comprising: temporarily storing the first operand before supply to the partial product generator; and temporarily storing the selected second operand or all 0's before supply to the partial product generator.
 8. The method of multiplication-accumulation of claim 5, further comprising: temporarily storing the first operand before supply to the partial product generator; and temporarily storing the second operand before selecting one of the second operand or all 0's. 